Shape Similarity and Visual Parts
Authors
Abstract
Human perception of shape is based on visual parts of objects, to the point that a single significant visual part is sufficient to recognize the whole object. For example, if you see a hand in the door, you expect a human behind the door. Therefore, a cognitively motivated shape similarity measure for recognition applications should be based on visual parts. This cognitive assumption leads to two related problems: scale selection and subpart selection. To find a given query part Q as part of an object C, Q needs to have the correct size with regard to C (scale selection). Assuming that the correct size is selected, the part Q must be compared to all possible subparts of C (subpart selection). For global, contour-based similarity measures, scaling the whole contour curves of both objects to the same length usually solves the problem of scale selection. Although this is not an optimal solution, it works if the whole contour curves are 'sufficiently' similar. The subpart selection problem does not occur in the implementation of global similarity measures. In this paper we present a shape similarity system that is based on correspondence of visual parts, and apply it to robot localization and mapping. This is a particularly interesting application, since the scale selection problem does not occur here and visual parts can be obtained in a very simple way. Therefore, only the problem of subpart selection needs to be solved. Our solution to this problem is based on a contour-based shape similarity measure supplemented by structural arrangement information of visual parts.

1 Motivation and Overview of Shape Descriptors

Shape descriptors for comparing silhouettes of 2D objects in order to determine their similarity are important and useful for a wide range of applications, of which the most obvious is shape-based object retrieval in image databases. The importance of shape is indicated by the fact that the MPEG-7 group incorporated shape descriptors into the MPEG-7 standard.
Since 2D objects are projections of 3D objects, their silhouettes may change due to: 1. a change of the view point with respect to the object, 2. non-rigid object motion (e.g., people walking or fish swimming), 3. noise (e.g., digitization and segmentation noise). The goal of the Core Experiment CE-Shape-1 [20] was to evaluate the performance of 2D shape descriptors under such conditions. The shapes were restricted to simple pre-segmented shapes defined by their bitmaps. Some example shapes are shown in Figure 1. The main requirement was that the shape descriptors should be robust to small non-rigid deformations due to (1), (2), or (3). In addition, the descriptors should be scale and rotation invariant.

Fig. 1. Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class.

The main part of the Core Experiment CE-Shape-1 was part B: similarity-based retrieval. The data set used for this part is composed of 1400 shapes stored as binary images. The shapes are divided into 70 classes with 20 images in each class. In the test, each image was used as a query, and the number of similar images (which belong to the same class) was counted among the top 40 matches (bull's-eye test). Since the maximum number of correct matches for a single query image is 20, the total number of correct matches is 28000. This data set has turned out to be the only set used to objectively evaluate the performance of various shape descriptors. We now present some of the shape descriptors with the best performance on this data set. It is not our goal to provide a general overview of all possible shape descriptors; a good overview can be found in the book by Costa and Cesar [4]. The shape descriptors can be divided into three main categories: 1. contour based descriptors: the contour of a given object is mapped to some representation from which a shape descriptor is derived, 2.
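The bull's-eye score is simple to compute once every query's ranked retrieval list is available. The following is a minimal sketch; the function name and input conventions are ours, not part of the MPEG-7 experiment specification:

```python
def bulls_eye_score(ranked_lists, labels, top_k=40, per_class=20):
    """Compute the bull's-eye retrieval rate.

    ranked_lists[i] holds database indices sorted by similarity to
    query i (the query itself may appear in its own list).
    labels[i] is the class label of shape i.
    The score is (total correct matches in the top_k of every query)
    divided by the maximum possible number of correct matches.
    """
    hits = 0
    for q, ranking in enumerate(ranked_lists):
        hits += sum(1 for j in ranking[:top_k] if labels[j] == labels[q])
    # maximum possible hits: per_class correct matches per query
    return hits / (per_class * len(ranked_lists))
```

For the MPEG-7 set (1400 shapes, 70 classes of 20, top 40), a perfect descriptor would reach 1.0, i.e., 28000 correct matches.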
area based descriptors: the computation of a shape descriptor is based on summing up pixel values in a digital image of the area containing the silhouette of a given object; the shape descriptor is a vector of a certain number of parameters derived this way (e.g., Zernike moments [13]), 3. skeleton based descriptors: after a skeleton is computed, it is mapped to a tree structure that forms the shape descriptor; the shape similarity is computed by some tree-matching algorithm. The idea of representing shapes by their skeletons in Computer Vision goes back to Blum [3]. Siddiqi et al. [25] also convert object skeletons to a tree representation and use a tree-matching algorithm to determine the shape similarity.

In the MPEG-7 Core Experiment CE-Shape-1 part B, shape descriptors of all three categories were used. A general conclusion is that contour based descriptors significantly outperformed the descriptors of the other two categories [20]. It seems that area based descriptors are more suitable for shape classification than for indexing. The weak performance of skeleton based descriptors can probably be explained by the unstable computation of skeletons, which stems from the discontinuous relation between an object boundary and its skeleton: a small change in the object boundary may lead to a large change in the skeleton. As reported in [20], the best retrieval performance of 76.45% for part B was obtained by the shape descriptor of Latecki and Lakaemper [17], which will be described in this paper (presented by the authors in cooperation with Siemens Munich), followed by the shape descriptor of Mokhtarian et al. [22, 23] with a retrieval rate of 75.44% (presented by Mitsubishi Electric ITE-VIL). It is important to mention that a 100% retrieval rate on this data set cannot be achieved using shape alone: the classification of the objects was done by human subjects, and consequently, some shapes can only be classified correctly when semantic knowledge is used.
Meanwhile, new shape descriptors have been developed that yield a slightly better performance. The best reported performance on this data set, 76.51%, was obtained by Belongie et al. [2]. The small differences in the retrieval rates of these approaches are more likely to indicate better parameter tuning than a better approach.

All the contour based shape descriptors have a common feature that limits their applicability: they require the presence of the whole contour to compute shape similarity. Although they are robust to some small distortions of contours, they will fail if a significant part of the contour is missing or different. The same critique applies to area and skeleton based shape descriptors, which require the whole object area or the complete skeleton to be present.

The goal of this paper is to direct our attention to a cognitively motivated ability of shape descriptors and shape similarity measures that is necessary for most practical applications of shape similarity: the ability of partial matching. Partial matching leads to two related problems of scale selection and subpart selection. To find a given query part Q as part of an object C, Q needs to have the correct size with regard to C (scale selection). Assuming that the correct size is selected, the part Q must be compared to all possible subparts of C (subpart selection). The subparts may be obtained either by a decomposition of C into parts using some decomposition criterion, or simply by sliding Q over all possible positions with respect to C, e.g., the beginning point of Q is aligned with each point of C.

A good example of an approach that allows for partial matching is the single-directional Hausdorff distance [12], which tries to minimize the distance of all points of the query part Q to points of object C. However, the problem of scale selection cannot be solved in the framework of the Hausdorff distance alone. For example, the approach presented in [12] simply enumerates all possible scales.
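The single-directional (directed) Hausdorff distance from Q to C is the largest distance any point of Q has to its nearest point of C. A naive O(|Q|·|C|) sketch over 2D point sets (practical implementations use spatial indexing; the function name is ours):

```python
import math

def directed_hausdorff(Q, C):
    """Single-directional Hausdorff distance from point set Q to C:
    max over q in Q of the distance from q to its nearest point in C.
    Small values mean every point of Q lies close to C, which is why
    this distance supports partial matching of Q against C."""
    return max(min(math.dist(q, c) for c in C) for q in Q)
```

Note the asymmetry: `directed_hausdorff(Q, C)` can be small while `directed_hausdorff(C, Q)` is large, which is exactly what lets a part Q match a larger object C.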
Moreover, the Hausdorff distance does not tolerate shape deformations that preserve the structure of visual parts, i.e., objects differing by such deformations, although very similar to humans, will have a large dissimilarity value. For global, contour-based similarity measures, scaling the whole contour curves of both objects to the same length usually solves the problem of scale selection. Although this is not an optimal solution, it works if the whole contour curves are 'sufficiently' similar. The subpart selection problem does not occur in the implementation of global similarity measures. To our knowledge, there does not exist an approach to partial shape similarity that also solves the scaling problem. In this paper we show that the shape descriptor presented by Latecki and Lakaemper [17] can be easily modified to perform partial matching when the scale is known. An ideal application where this restriction is satisfied is robot localization and mapping using laser range data. Therefore, we apply our shape similarity measure in this context.

2 Shape representation, simplification, and matching

For a successful shape representation we need to account for arbitrary shapes: any kind of boundary information obtained must be representable. Therefore, we use polygonal curves as boundary representation. We developed a theory and a system for a cognitively motivated shape similarity measure for silhouettes of 2D objects [17, 18, 16]. To reduce the influence of digitization noise as well as segmentation errors, the shapes are first simplified by a novel process of discrete curve evolution which we introduced in [16, 19]. This allows us
• (a) to reduce the influence of noise and
• (b) to simplify the shape by removing irrelevant shape features without changing relevant shape features.
A few stages of our discrete curve evolution are shown in Figure 2.
The discrete curve evolution is context sensitive, since whether shape components are relevant or irrelevant cannot be decided without context. In [16], we show that the discrete curve evolution allows us to identify significant visual parts, since significant visual parts become maximal convex arcs on an object contour simplified by the discrete curve evolution.

Let P be a polyline (that does not need to be simple). We denote the vertices of P by Vertices(P). A discrete curve evolution produces a sequence of polylines P = P_0, ..., P_m such that |Vertices(P_m)| ≤ 3, where |·| is the cardinality function.

Fig. 2. A few stages of our discrete curve evolution.

Each vertex v in P_i (except the first and the last if the polyline is not closed) is assigned a relevance measure that depends on v and its two neighbor vertices u, w in P_i:

K(v, P_i) = K(u, v, w) = |d(u, v) + d(v, w) − d(u, w)|,   (1)

where d is the Euclidean distance function. Note that K measures the bending of P_i at vertex v; it is zero when u, v, w are collinear. The process of discrete curve evolution (DCE) is very simple:

– At every evolution step i = 0, ..., m − 1, a polygon P_{i+1} is obtained after the vertices whose relevance measure is minimal have been deleted from P_i.

For end vertices of open polylines no relevance measure is defined, since the end vertices do not have two neighbors. Consequently, end points of open polylines remain fixed. Note that P_{i+1} is obtained from P_i by deleting a vertex such that the length change between P_i and P_{i+1} is minimal. Observe that the relevance measure K(v, P_i) is not a local property with respect to the polygon P = P_0, although its computation is local in P_i for every vertex v. This implies that the relevance of a given vertex v is context dependent, where the context is given by the adaptive neighborhood of v, since the neighborhood of v in P_i can be different from its neighborhood in P_0.
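The relevance measure (1) and the evolution step can be sketched as follows. This is a naive O(n²) illustration with function names of our own choosing, not the efficient tree-based implementation discussed in the text:

```python
import math

def relevance(u, v, w):
    """K(u, v, w) = |d(u,v) + d(v,w) - d(u,w)| from Eq. (1).
    Zero exactly when u, v, w are collinear (v contributes no bending)."""
    return abs(math.dist(u, v) + math.dist(v, w) - math.dist(u, w))

def dce(polyline, target=3, closed=False):
    """Naive discrete curve evolution: repeatedly delete the vertex
    with minimal relevance until only `target` vertices remain.
    End points of open polylines have no relevance and stay fixed."""
    pts = list(polyline)
    while len(pts) > target:
        n = len(pts)
        if closed:
            candidates = range(n)
            key = lambda i: relevance(pts[i - 1], pts[i], pts[(i + 1) % n])
        else:
            candidates = range(1, n - 1)  # keep the two end points
            key = lambda i: relevance(pts[i - 1], pts[i], pts[i + 1])
        del pts[min(candidates, key=key)]
    return pts
```

Collinear vertices have relevance zero and are therefore removed first, which matches the intuition that they carry no shape information.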
The discrete curve evolution has also been successfully applied in the context of video analysis to simplify video trajectories in feature space [6, 15].

DCE may be implemented efficiently. The polyline's vertices can be represented within a doubly-linked polyline structure and a self-balancing tree simultaneously. Setting up this structure for a polyline containing n vertices has a complexity of O(n log n). A step of DCE consists of picking the least relevant vertex (O(log n)), removing it (O(log n)), and updating its neighbors' relevance measures (O(log n)). As there are at most n vertices to be deleted, this yields an overall complexity of O(n log n). Since DCE is applied to segmented polylines, the number of vertices is much smaller than the number of points read from the sensor.

To compute our similarity measure between two polygonal curves, we establish the best possible correspondence of maximal convex arcs. To achieve this, we first decompose the polygonal curves into maximal convex subarcs. A simple one-to-one comparison of maximal convex arcs of two polygonal curves is of little use, since the curves may consist of a different number of such arcs and even similar shapes may have different small features. Therefore, we allow for 1-to-1, 1-to-many, and many-to-1 correspondences of the maximal convex arcs. The main idea is that on at least one of the contours we have a maximal convex arc that corresponds to a part of the other contour composed of adjacent maximal convex arcs. In this context the corresponding parts of contours can be identified with visual object parts. The best correspondence of the visual object parts, i.e., the one yielding the lowest similarity measure, can be computed using dynamic programming, where the similarity of the corresponding visual parts is as defined below; the similarities between corresponding parts are computed and aggregated. The computation is described extensively in [17].
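The dynamic-programming search over arc correspondences can be illustrated with the following simplified sketch. This is not the algorithm of [17]: the arc representation, the grouping bound `max_group`, and the cost functions `arc_sim` and `merge` are placeholders supplied by the caller; it only demonstrates how 1-to-1, 1-to-many, and many-to-1 correspondences fit into one recurrence:

```python
from functools import lru_cache

def optimal_correspondence(A, B, arc_sim, merge, max_group=3):
    """Best total dissimilarity over correspondences of two arc
    sequences A and B. One arc may match a merge of up to max_group
    adjacent arcs of the other curve (1-to-many / many-to-1)."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(A) and j == len(B):
            return 0.0          # both curves fully matched
        if i == len(A) or j == len(B):
            return float('inf') # one curve exhausted early
        cost = float('inf')
        # one arc of A vs a group of 1..max_group adjacent arcs of B
        for k in range(1, max_group + 1):
            if j + k <= len(B):
                cost = min(cost,
                           arc_sim(A[i], merge(tuple(B[j:j + k]))) + best(i + 1, j + k))
        # a group of 2..max_group adjacent arcs of A vs one arc of B
        for k in range(2, max_group + 1):
            if i + k <= len(A):
                cost = min(cost,
                           arc_sim(merge(tuple(A[i:i + k])), B[j]) + best(i + k, j + 1))
        return cost
    return best(0, 0)
```

With a toy arc representation (an arc is a number, `merge` is `sum`, `arc_sim` is the absolute difference), the arcs (1, 2) on one curve correspond to the single arc 3 on the other at zero cost, illustrating the many-to-1 case.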
The similarity induced from the optimal correspondence of polylines C and D will be denoted S(C, D). Two example correspondences obtained by our approach are shown in Fig. 3. Since our shape matching technique is based on the correspondence of visual parts, it will also work under a moderate amount of occlusion and/or segmentation errors.

Fig. 3. The corresponding arcs are labeled by the same numbers.

The basic similarity of arcs is defined in tangent space. Tangent space, also called the turning function, is a multi-valued step function mapping a curve into the interval [0, 2π) by representing only the angular directions of its line segments. Furthermore, arc lengths are normalized to 1 prior to the mapping into tangent space. This representation was previously used in computer vision, in particular in [1]. Denoting the mapping function by T, the similarity of arcs is defined by comparing their representations in tangent space.
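The tangent-space (turning function) mapping described above can be sketched as follows. This is a minimal illustration; the function name and the list-of-(start, angle) step representation are our own choices, not the paper's:

```python
import math

def turning_function(polyline):
    """Map a polygonal arc to tangent space: a step function over
    normalized arc length [0, 1], where each step carries the angular
    direction (in [0, 2*pi)) of one line segment. Returned as a list
    of (normalized_start_position, angle) pairs."""
    segs = list(zip(polyline, polyline[1:]))
    lengths = [math.dist(p, q) for p, q in segs]
    total = sum(lengths)
    steps, s = [], 0.0
    for (p, q), length in zip(segs, lengths):
        angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
        steps.append((s / total, angle))  # step starts at normalized position s/total
        s += length
    return steps
```

For example, an L-shaped arc of two unit segments (rightward, then upward) maps to a step function with value 0 on [0, 0.5) and π/2 on [0.5, 1), independent of the arc's absolute size, which gives the scale normalization mentioned above.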